Local Binary Pattern for Word Spotting in Handwritten Historical Document
نویسندگان
چکیده
Digital libraries store images which can be highly degraded and to index this kind of images we resort to word spotting as our information retrieval system. Information retrieval for handwritten document images is more challenging due to the difficulties in complex layout analysis, large variations of writing styles, and degradation or low quality of historical manuscripts. This paper presents a simple innovative learning-free method for word spotting from large scale historical documents combining Local Binary Pattern (LBP) and spatial sampling. This method offers three advantages: firstly, it operates in completely learning free paradigm which is very different from unsupervised learning methods, secondly, the computational time is significantly low because of the LBP features which are very fast to compute, and thirdly, the method can be used in scenarios where annotations are not available. Finally we compare the results of our proposed retrieval method with the other methods in the literature.
منابع مشابه
Connected Component Based Word Spotting on Persian Handwritten image documents
Word spotting is to make searchable unindexed image documents by locating word/words in a doc-ument image, given a query word. This problem is challenging, mainly due to the large numberof word classes with very small inter-class and substantial intra-class distances. In this paper, asegmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...
متن کاملRadial Line Fourier Descriptor for Segmentation-free Handwritten Word Spotting
Automatic recognition of historical handwritten manuscripts is a daunting task due to paper degradation over time. Recognition-free retrieval or word spotting is popularly used for information retrieval and digitization of the historical handwritten documents. However, the performance of word spotting algorithms depends heavily on feature detection and representation methods. Although there exi...
متن کاملA probabilistic method for keyword retrieval in handwritten document images
Keyword retrieval in handwritten document images (word spotting) is very challenging given that OCR accuracy is not yet adequate for handwritten scripts, specially with large lexicons. Various proposed approaches build indices on information such as image features or OCR scores and have improved the performance of the traditional approach that builds index on OCR’ed text. In this paper, we impr...
متن کاملEfficient segmentation-free keyword spotting in historical document collections
In this paper we present an efficient segmentation-free word spotting method, applied in the context of historical document collections, that follows the query-byexample paradigm. We use a patch-based framework where local patches are described by a bag-of-visual-words model powered by SIFT descriptors. By projecting the patch descriptors to a topic space with the Latent Semantic Analysis techn...
متن کاملHandwritten Word Spotting in Old Manuscript Images Using a Pseudo-structural Descriptor Organized in a Hash Structure
There are lots of historical handwritten documents with information that can be used for several studies and projects. The Document Image Analysis and Recognition community is interested in preserving these documents and extracting all the valuable information from them. Handwritten word-spotting is the pattern classification task which consists in detecting handwriting word images. In this wor...
متن کامل